fix(websocket): Fix websocket client race on abort and memory leak(IDFGH-16555) #924

gabsuren · 2025-10-28T10:16:03Z

TODO: The inline comments will be removed after the review (they are left in to make reviewing easier).

Description

This PR fixes critical memory leaks, crashes, and data loss in the ESP WebSocket client that occur in reconnection scenarios when CONFIG_ESP_WS_CLIENT_SEPARATE_TX_LOCK=y:

- Double-free crashes: heap corruption during abort/reconnect scenarios
- Data loss: the first packet after reconnection is not received
- Error buffer accumulation: 2 KB memory leak on disconnect

Changes Made:

- Add state check in abort_connection to prevent double-close
- Fix memory leak: free errormsg_buffer on disconnect
- Reset connection state on reconnect to prevent stale data
- Implement lock ordering for separate TX lock mode
- Add sdkconfig.ci.tx_lock config
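The first two changes (the double-close guard and freeing the error buffer) can be pictured with the sketch below. It is a simplified, hypothetical model, not the actual esp_websocket_client.c code: `ws_client_t`, `ws_abort_connection()`, the state names, and the counters are illustrative stand-ins, and it assumes the guard treats an already-aborted connection as a no-op. A second abort from a racing error path therefore neither closes the transport again nor dispatches a second disconnect event, and the error buffer is freed with its size reset so it cannot accumulate across reconnects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical state names; the real client defines its own state enum. */
typedef enum {
    WS_STATE_INIT,
    WS_STATE_CONNECTED,
    WS_STATE_ABORTED, /* connection torn down, waiting to reconnect */
} ws_state_t;

/* Hypothetical, heavily simplified client. */
typedef struct {
    ws_state_t state;
    char *errormsg_buffer;  /* allocated when an error message is captured */
    size_t errormsg_size;
    int transport_closes;   /* stands in for transport close calls */
    int disconnect_events;  /* stands in for dispatched disconnect events */
} ws_client_t;

static int ws_abort_connection(ws_client_t *client)
{
    assert(client != NULL);

    /* Guard: a second abort is a no-op, so the transport is never closed twice. */
    if (client->state == WS_STATE_ABORTED) {
        return 0;
    }

    client->transport_closes++;
    client->state = WS_STATE_ABORTED;
    client->disconnect_events++;

    /* Release the error buffer so it does not leak across reconnects. */
    free(client->errormsg_buffer);  /* free(NULL) is a no-op */
    client->errormsg_buffer = NULL;
    client->errormsg_size = 0;
    return 0;
}
```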

Checklist

Before submitting a Pull Request, please ensure the following:

🚨 This PR does not introduce breaking changes.
[ ✓ ] All CI checks (GH Actions) pass.
[ ✓ ] Documentation is updated as needed.
Tests are updated or added as necessary.
[ ✓ ] Code is well-commented, especially in complex areas.
[ ✓ ] Git history is clean: commits are squashed to the minimum necessary.

Note

Hardens the WebSocket client's abort/reconnect and I/O paths: guards against double-close, frees the error buffer, fixes lock ordering (tx_lock) including PING/PONG handling, resets connection state on connect, and sets transport pointers to NULL after destruction; adds a CI config for the separate TX lock.

WebSocket client robustness
- Add a state guard in esp_websocket_client_abort_connection() to avoid double-close; dispatch the disconnect event; free errormsg_buffer and reset its size.
- On successful connect, reset payload_len/offset, last_fin, and last_opcode to prevent stale state.
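Resetting the receive-side bookkeeping on connect might look like the sketch below. The struct, the `payload_offset` field name, the `WS_OPCODE_NONE` marker, and the chosen reset values are hypothetical stand-ins for the client's `payload_len`/offset, `last_fin`, and `last_opcode` fields. The point is that nothing from a partially received frame on the previous connection survives into the new one, where it could cause the first packet to be misread or dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "no previous frame" marker (assumed value). */
#define WS_OPCODE_NONE 0x100

/* Hypothetical subset of the client's per-connection receive state. */
typedef struct {
    int payload_len;     /* length of the frame currently being received */
    int payload_offset;  /* bytes of that frame already delivered */
    bool last_fin;       /* FIN flag of the last frame */
    int last_opcode;     /* opcode of the last frame */
} ws_rx_state_t;

/* Call once the new connection is established, before the first read. */
static void ws_reset_rx_state(ws_rx_state_t *rx)
{
    assert(rx != NULL);
    *rx = (ws_rx_state_t){
        .payload_len = 0,
        .payload_offset = 0,
        .last_fin = false,            /* assumed reset value */
        .last_opcode = WS_OPCODE_NONE,
    };
}
```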
Locking and I/O fixes
- esp_websocket_client_send_with_exact_opcode() and recv path: refine error handling to call abort_connection with proper lock ordering when CONFIG_ESP_WS_CLIENT_SEPARATE_TX_LOCK is enabled; add debug logging.
- PING→PONG handling: release client->lock, acquire tx_lock with timeout, re-acquire client->lock, validate state/transport before sending PONG; ensure rx_buffer is freed on early returns.
- In the task loop: protect recv with client->lock, and take client->lock around the abort on a poll-read error.
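The PING→PONG lock sequence described above can be modeled with POSIX mutexes. This is a hypothetical sketch, not the ESP-IDF code: `pthread_mutex_t` stands in for the FreeRTOS semaphores, `ws_client_t` and `ws_send_pong_locked()` are illustrative names, and the PONG write is replaced by a counter. It assumes the caller holds `lock` on entry, as the receive path does. The function never waits for `tx_lock` while holding `lock`, re-acquires the locks in the order tx_lock then lock, and re-checks state and transport afterwards, since another task may have aborted the connection in the meantime.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical model: pthread mutexes stand in for the FreeRTOS semaphores. */
typedef struct {
    pthread_mutex_t lock;     /* models client->lock */
    pthread_mutex_t tx_lock;  /* models client->tx_lock */
    bool connected;           /* models "state is CONNECTED" */
    bool has_transport;       /* models "client->transport != NULL" */
    int pongs_sent;           /* stands in for the actual PONG write */
} ws_client_t;

/* Caller must hold c->lock; c->lock is held again on every return path.
 * Returns true if a PONG was "sent". */
static bool ws_send_pong_locked(ws_client_t *c, long tx_timeout_ms)
{
    assert(c != NULL);

    /* 1. Never wait for tx_lock while holding lock. */
    pthread_mutex_unlock(&c->lock);

    /* 2. Take tx_lock, but give up after tx_timeout_ms. */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += tx_timeout_ms / 1000;
    deadline.tv_nsec += (tx_timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    int rc = pthread_mutex_timedlock(&c->tx_lock, &deadline);

    /* 3. Re-take lock: order is tx_lock first, then lock. */
    pthread_mutex_lock(&c->lock);
    if (rc != 0) {
        return false;  /* tx_lock timed out: skip this PONG */
    }

    /* 4. The connection may have been aborted while lock was released. */
    bool sent = false;
    if (c->connected && c->has_transport) {
        c->pongs_sent++;
        sent = true;
    }
    pthread_mutex_unlock(&c->tx_lock);
    return sent;
}
```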
Cleanup
- After esp_transport_list_destroy(), set client->transport_list and client->transport to NULL.
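Clearing the pointers after destruction makes a repeated cleanup harmless, because a second pass sees `NULL` instead of freed memory. In this hypothetical sketch, `transport_list_t`, `transport_list_destroy()`, and `ws_destroy_transports()` are illustrative stand-ins for esp_transport_list_destroy() and the client teardown; `transport` models an active transport that is owned by the list and therefore dangles once the list is freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the transport list and its elements. */
typedef struct { int id; } transport_t;
typedef struct { transport_t items[2]; } transport_list_t;

typedef struct {
    transport_list_t *transport_list;
    transport_t *transport;  /* points into transport_list */
} ws_client_t;

static void transport_list_destroy(transport_list_t *list)
{
    free(list);  /* frees every transport the list owns */
}

static void ws_destroy_transports(ws_client_t *client)
{
    assert(client != NULL);
    if (client->transport_list != NULL) {
        transport_list_destroy(client->transport_list);
    }
    /* Both pointers now refer to freed memory; clear them so later
     * cleanup or reconnect code sees "no transport" instead. */
    client->transport_list = NULL;
    client->transport = NULL;
}
```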
Examples/CI
- Add examples/target/sdkconfig.ci.tx_lock enabling CONFIG_ESP_WS_CLIENT_SEPARATE_TX_LOCK and lock timeout.

Written by Cursor Bugbot for commit f474654. This will update automatically on new commits.

CLAassistant · 2025-10-28T10:16:10Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

components/esp_websocket_client/esp_websocket_client.c

+#else
+        // When separate TX lock is not configured, we already hold client->lock
+        // which protects the transport, so we can send PONG directly
+        esp_transport_ws_send_raw(client->transport, WS_TRANSPORT_OPCODES_PONG | WS_TRANSPORT_OPCODES_FIN, data, client->payload_len,



euripedesrocha · 2025-11-05T07:36:33Z

components/esp_websocket_client/esp_websocket_client.c

            }
+            ESP_LOGD(TAG, "Calling abort_connection due to send error");
+#ifdef CONFIG_ESP_WS_CLIENT_SEPARATE_TX_LOCK
+            xSemaphoreGiveRecursive(client->tx_lock);


It would be better to move this check into the abort_connection function.

gabsuren · Nov 10, 2025

@euripedesrocha I'm not sure about that. abort_connection is called from five different places, and moving this into it would require extra detection logic ("which lock am I holding?"). Only one place, the send error path, needs to switch locks (tx_lock → client->lock).
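The lock switch in the send error path can be sketched as follows. Like the other sketches, this is a hypothetical model using POSIX mutexes in place of the FreeRTOS recursive semaphores; `ws_send()`, `ws_abort_connection_locked()`, and the `transport_result` parameter (which simulates the outcome of the transport write) are illustrative. It assumes the send path returns straight after the abort rather than re-taking `tx_lock`. The send path drops `tx_lock` before taking `client->lock`, so it never holds both while aborting, and the abort function itself stays lock-agnostic, which is the design the reply argues for.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: pthread mutexes stand in for the FreeRTOS semaphores. */
typedef struct {
    pthread_mutex_t lock;     /* models client->lock */
    pthread_mutex_t tx_lock;  /* models client->tx_lock */
    bool connected;
    int aborts;               /* counts real (non-duplicate) aborts */
} ws_client_t;

/* Caller must hold c->lock; the function itself takes no locks. */
static void ws_abort_connection_locked(ws_client_t *c)
{
    if (!c->connected) {
        return;  /* already aborted: avoid double-close */
    }
    c->connected = false;
    c->aborts++;
}

/* transport_result simulates the transport write: bytes written, or < 0 on error. */
static int ws_send(ws_client_t *c, int transport_result)
{
    assert(c != NULL);
    pthread_mutex_lock(&c->tx_lock);
    int ret = transport_result;
    if (ret < 0) {
        /* Lock switch: release tx_lock, then take lock for the abort. */
        pthread_mutex_unlock(&c->tx_lock);
        pthread_mutex_lock(&c->lock);
        ws_abort_connection_locked(c);
        pthread_mutex_unlock(&c->lock);
        return -1;  /* tx_lock is already released here */
    }
    pthread_mutex_unlock(&c->tx_lock);
    return ret;
}
```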


- Add state check in abort_connection to prevent double-close
- Fix memory leak: free errormsg_buffer on disconnect
- Reset connection state on reconnect to prevent stale data
- Implement lock ordering for separate TX lock mode
- Added sdkconfig.ci.tx_lock config

gabsuren added the websocket label Oct 28, 2025

This comment was marked as outdated.


gabsuren changed the title ~~Fix/ws race on abort~~ fix(websocket): Fix websocket client race on abort and memory leak(IDFGH-16555) Oct 28, 2025

gabsuren force-pushed the fix/ws_race_on_abort branch 3 times, most recently from 67bd7e3 to 46871bf on October 28, 2025 13:09

github-advanced-security bot found potential problems Oct 28, 2025


gabsuren requested a review from david-cermak October 29, 2025 09:09

gabsuren force-pushed the fix/ws_race_on_abort branch from 46871bf to 5577e03 on October 29, 2025 10:54

This comment was marked as outdated.


gabsuren force-pushed the fix/ws_race_on_abort branch 3 times, most recently from ca2956e to 0e58789 on October 30, 2025 10:53

euripedesrocha reviewed Nov 5, 2025


gabsuren force-pushed the fix/ws_race_on_abort branch from 0e58789 to 62925a5 on November 10, 2025 10:13

cursor bot reviewed Nov 10, 2025


components/esp_websocket_client/esp_websocket_client.c (resolved)

gabsuren force-pushed the fix/ws_race_on_abort branch from 62925a5 to 15dcb35 on November 10, 2025 10:24

gabsuren force-pushed the fix/ws_race_on_abort branch from 15dcb35 to f474654 on November 10, 2025 10:27

gabsuren requested a review from euripedesrocha November 10, 2025 10:28
